Improving the Scalability of Reduct Determination in Rough Sets

Author

  • Shahid Mahmood
Abstract

Rough Set Data Analysis (RSDA) is a non-invasive data analysis approach that relies solely on the data to find patterns and decision rules. Despite this non-invasive approach and its ability to generate human-readable rules, classical RSDA has not been used successfully in commercial data mining and rule-generating engines. The reason is scalability: classical RSDA slows down considerably on larger data sets and takes much longer to generate rules. This research aims to address the scalability of rough sets by improving the performance of the attribute reduction step of classical RSDA, which is the root cause of its slow performance. We propose moving the entire attribute reduction process into the database. We define a new schema to store the initial data set, and then define SQL queries on this schema that find the attribute reducts correctly and faster than the traditional RSDA approach. We tested our technique on two typical data sets and compared the results with the traditional RSDA approach to attribute reduction. Finally, we highlight some issues with the proposed approach that could lead to future research.
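
The abstract does not give the schema or the queries, so the following is only a minimal sketch of the general idea of letting the database do the counting work in attribute reduction; it is not the author's method. It assumes a consistent decision table stored as one plain table (the table name `decision_table`, the columns `a`, `b`, `c`, `d`, and all function names are hypothetical). A subset of condition attributes is taken to preserve the decision when it has as many distinct value combinations on its own as it has together with the decision attribute, and the minimal such subsets are found by brute force.

```python
import sqlite3
from itertools import combinations


def count_distinct(conn, table, columns):
    """Count distinct value combinations over `columns`, inside the database."""
    # Identifiers are interpolated directly, so they must come from trusted code.
    cols = ", ".join(columns)
    sql = f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM {table})"
    return conn.execute(sql).fetchone()[0]


def preserves_decision(conn, table, subset, decision):
    """True if no two rows agree on `subset` but differ on `decision`."""
    with_decision = list(subset) + [decision]
    return count_distinct(conn, table, subset) == count_distinct(
        conn, table, with_decision
    )


def find_reducts(conn, table, conditions, decision):
    """Brute-force search for minimal decision-preserving subsets.

    Exponential in the number of condition attributes; illustration only.
    Returns an empty list if the table is inconsistent.
    """
    reducts = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            # Skip supersets of a reduct already found: they are not minimal.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if preserves_decision(conn, table, subset, decision):
                reducts.append(subset)
    return reducts


# Hypothetical toy data: c duplicates a, and d depends on a and b together.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE decision_table (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
conn.executemany(
    "INSERT INTO decision_table VALUES (?, ?, ?, ?)",
    [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 1, 1), (1, 1, 1, 0)],
)

reducts = find_reducts(conn, "decision_table", ["a", "b", "c"], "d")
print(reducts)  # [('a', 'b'), ('b', 'c')]
```

In this sketch every candidate subset costs two `SELECT DISTINCT` scans, so the database handles grouping and counting while the enumeration of subsets stays in the host program. A practical system would need a smarter search than exhaustive enumeration.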


Similar resources

Feature ranking in rough sets

We propose a novel feature ranking technique using discernibility matrix. Discernibility matrix is used in rough set theory for reduct computation. By making use of attribute frequency information in discernibility matrix, we develop a fast feature ranking mechanism. Based on the mechanism, two heuristic reduct computation algorithms are proposed. One is for optimal reduct and the other for app...


Mushroom Plant Analysis through Reduct Technique

Real-world data raise several issues: very large data sets, mixed types of data (continuous-valued, symbolic), uncertainty (noisy data), incompleteness (missing, incomplete data), data change, use of background knowledge, etc. A lot of knowledge related to the application can be generated from these large data sets. Rough set theory is a methodology that can be used to deduce rules from these data sets...



Fuzzy rough set based incremental attribute reduction from dynamic data with sample arriving

Attribute reduction with fuzzy rough sets is an effective technique for selecting the most informative attributes from a given real-valued dataset. However, existing algorithms for attribute reduction with fuzzy rough sets have to re-compute a reduct from dynamic data with sample arriving, where one sample or multiple samples arrive successively. This is clearly uneconomical from a computational point ...


A New Rough Sets Model Based on Database Systems

Rough sets theory was proposed by Pawlak in the 1980s and has been applied successfully in a lot of domains. One of the major limitations of the traditional rough sets model in the real applications is the inefficiency in the computation of core and reduct, because all the intensive computational operations are performed in flat files. In order to improve the efficiency of computing core attrib...



Publication date: 2011